Lossy Source Coding via Spatially Coupled LDGM 

Ensembles 



Vahid Aref, Nicolas Macris, Riidiger Urbanke and Marc Vuffray 
LTHC, IC, EPFL, Lausanne, Switzerland, 



Abstract — We study a new encoding scheme for lossy source 
compression based on spatially coupled low-density generator- 
matrix codes. We develop a belief-propagation guided-decimation 
algorithm, and show that this algorithm allows to approach 
the optimal distortion of spatially coupled ensembles. Moreover, 
using the survey propagation formalism, we also observe that the 
optimal distortions of the spatially coupled and individual code 
ensembles are the same. Since regular low-density generator- 
matrix codes are known to achieve the Shannon rate-distortion 
bound under optimal encoding as the degrees grow, our results 
suggest that spatial coupling can be used to reach the rate- 
distortion bound, under a low complexity belief-propagation 
guided-decimation algorithm. 

This problem is analogous to the MAX-XORSAT problem in 
computer science. 

Index Terms — Lossy source coding, spatial coupling, LDGM, 
belief propagation guided decimation, rate distortion bound. 

I. Introduction 

The spatial coupling of copies of a graphical code was intro- 
duced in 111] in the form of convolutional low-density parity- 
check (LDPC) codes. The performance of such ensembles 
under the belief propagation (BP) algorithm is consistently 
better than the performance of the underlying ensembles 
The key observation is that the BP threshold of a coupled en- 
semble considerably improves and gets close to the maximum 
a posteriori (MAP) threshold of the underlying ensemble. This 
threshold saturation phenomenon has been studied rigorously 
m . Furthermore, it has been investigated in other models 
such as the Curie- Weiss chain, random satisfiability, graph 
coloring and compressed sensing 11QL 11111 . 

One of the classic problems in communications is lossy 
source compression. The objective is to compress a given 
sequence, so that it can be reconstructed up to some spec- 
ified distortion. For binary symmetric sources, low-density 
generator-matrix (LDGM) codes are able to asymptotically 
achieve Shannon's rate-distortion bound under optimal encod- 
ing (minimum distance encoding) 112, [Isll . However, this is 
not a computationally efficient scheme. 

LDGM codes are well-suited to low-complexity message 
passing algorithms such as the BP algorithm. But using LDGM 
codes with a plain BP algorithm is not very effective in 
lossy compression. To achieve more promising results, one 
can equip the BP algorithm with a decimation process. The 
general idea of belief-propagation guided-decimation (BPGD) 
algorithms is to: i) compute BP marginals, ii) fix bits with the 
largest bias, iii) decimate the graph. Decimation reduces the 
graph to a smaller one, on which this process is repeated. Many 
variants of message passing algorithms and various decimation 



processes have been investigated. In 111411 a survey propagation 
(SP) inspired decimation algorithm is proposed. Simulations 
show distortions close to the rate-distortion bound for large 
block-lengths. Later, similar results were reported with modi- 
fied forms of BP ll5Lll6l1 . The performance of these algorithms 
also depends on the choice of degree distributions for LDGM 
codes. A heuristic choice is to use degree distributions that 
have been optimized for LDPC codes on binary symmetric 
channel 1114141611 . No rigorous analysis exists to date which 
can explain why this is the case. 

In the present contribution, we study a spatially coupled 
LDGM ensemble for lossy source compression. Two of us 
II 1 711 studied such ensembles in the context of rateless codes 
(channel coding) and demonstrated that threshold saturation 
takes place. We provide numerical evidence showing that a 
similar effect occurs in lossy compression. In particular, the 
BPGD distortion of the coupled LDGM ensemble approaches 
(numerically) the optimal distortion of the underlying en- 
semble. We use the simplest forms of BP and decimation 
processes, and we take regular degrees. It is noteworthy that 
no optimization is needed, thus suggesting that the observed 
saturation is related to fundamental principles. Regular (under- 
lying) LDGM ensembles with large degrees have an optimal 
distortion that approaches the rate-distortion limit. Thus with 
spatially coupled LDGM codes with regular large degrees 
and a BPGD algorithm we can attain the Shannon limit. The 
complexity of the encoding scheme presented here is 0(n 2 L 2 ) 
where L is the number of copies of the underlying ensemble 
and n the size of each copy. 

In section HH we briefly review lossy compression and ex- 
plain the structure of coupled LDGM ensembles. In section HTI1 
we formulate lossy compression as an optimization problem 
and compute the optimal distortion for underlying and coupled 
LDGM ensembles, by using the SP formalism. In section [IVl 
we formulate and discuss the BPGD algorithm. Simulation 
results for this algorithm are presented in section [Vj A few 
practical issues are discussed in section [VU 

II. Framework 

1 ) Lossy Compression of Symmetric Bernoulli Sources: 
Let X_ G X = {0,l} n represent a binary source of length n. 
We have X = {X u X 2 , • • • , X n }, where {X a }™ =1 are i.i.d 
random variables with ¥{X a = 1} = |, for a G {1, . . . , n}. 
We compress a given source word x by mapping it to one of 
the 2 nR index words u G U = {0, l} nR , where R G [0, 1] is 
the rate. The stored sequence u then defines a reconstructed 
source sequence x_(u) G X, where the source decoding map 



Fig. 1. The factor graph associated to an underlying LDGM code. 



W = 2 



u —> x_(u) depends on the structure of the code. For a given 
pair (x,x), the distortion is measured by Hamming distance 
^d H (x,x) = ± YZ=i \ x a -x a \. Thus, the average quality of 
the reconstruction is measured by D := -Ex (dn(x, x)). 

For the symmetric Bernoulli source, it is well-known that for 
any scheme, the average distortion is lower-bounded by D S h, 
defined implicitly by the rate-distortion function h(D s ^) = 
1 — R, D S h G [0, |]. Here h(-) is the binary entropy function. 
The goal is to find an encoding scheme such that, for a given 
R, it achieves the rate-distortion lower bound. 

For this purpose, we study LDGM codes since they are 
able to achieve the rate-distortion bound under the optimal 
encoding when the average degrees increase fl2h . 

2) LDGM Codes: Consider an LDGM code of block length 
n and rate R = ^. Let i/i, • • • , u m be m code-bits of the 
index word u and let x\ , X2 , • • • , x n be n reconstruction bits 
of the source word xi, X2, • • • , x n . An LDGM code is usually 
represented by a bipartite graph as depicted in Figure Q] 
Call this graph T(C, G; E). Each code-bit ui is represented 
by a node i G C(T). For each reconstruction bit x a , there 
is a generator node a G G(T). Each edge (i,a) G -S(r) 
shows that the code-bit is connected to the corresponding 
reconstruction bit x a . We denote by da the set of all code- 
bit nodes connected to a G G, i.e. da = {i G C| (i, a) e E}. 
Similarly, for i G C, di = {a G G\ (i,a) G £}. Thus, the 
reconstructed bit x a is obtained from the code-bits u by a 
mod 2 sum x a = (BiedaUi. 

In this paper, we focus on the (/, r, n) -regular LDGM 
random ensemble, where I (resp. r) is the degree of generator 
(resp. code-bit) nodes. We refer to 118] 

for the detailed 

construction of this ensemble. 

3) Chain of LDGM(l , r n) Ensembles: Consider L sets of 
nodes each having m = code-bit nodes and n generator 
nodes (and reconstruction nodes). Locate the sets in positions 
to L— 1 on a circle (see FigureO. We randomly connect each 
generator node in position z G [0, L — 1], to / code-bit nodes 
from w sets in the range [z, z + w — 1]. Eventually (for large 
n and m) code-bit nodes have degree r. The details of this 
construction are explained in Q,[l7|]. Note that this ensemble 
has the same local structure as the underlying LDGM(/,r) 
ensemble. Because of the global circular structure, we call it 
a Closed Chain LDGM(l,r, L,w,n) ensemble (CCLDGM). 

We wish to point out a difference between the present 
construction and the ones used so far in channel coding. The 
latter are based on open linear chains with suitable boundary 
conditions that help initiate the decoding process, which then 
propagates like a wave through the system. In the present 



Fig. 2. Illustration of a factor graph from a CCLDGM ensemble with length 
L = 8 and w = 2. 



periodic construction, encoding will be seeded by preferential 
decimation at a specific position (say z = L/2), in an initial 
phase of the BPGD algorithm. 

III. Optimal Encoding and Distortion 

A. Optimal Encoding 

Let T be the factor graph of a random instance of an 
LDGM(/, r, n) ensemble or CCLDGM (Z, r,L,w,n) ensemble. 
We are looking for a configuration u* that minimizes the 
distortion between x and x_(u), i.e. 

dmin := min ^—d H (x,x) \d H (x, ®ied a u*) . (1) 

Note that for the individual ensemble L = 1. As we are 
interested in the average distortion of the ensemble over all 
X_ G X, we define the optimal distortion as 



D pt = Er,x(^min). 



(2) 



We transform the above minimization problem into a maxi- 
mum likelihood problem by equipping the configuration space 



{0,1} 



nLl/r 



with the following probability measure, 



lip{u | x) := 



1 



-/3cIh(x,x.) 



Z(f3 | x) 
1 

Z{fi | x 



n 



-P\x a - 



(3) 



where f3 G 



aeC{T) 

is a non-negative parameter, and 



Z(f3\x) = ^2 u6 -l 3d H(x,x) normalizing factor. 

The minimizer u* in (Q]) maximizes the measure, 
i.e, max^/i/? (u \ x) = /up (u* \ x). Moreover, for a 
particular x, the minimal distortion can be obtained as 
^min = — (nL) -1 lim^^oo In Z (f3 \ x) . This formulation 
of the problem allows to use techniques from statistical 
physics to compute D pt- In the physics interpretation, dn 
and up are a random hamiltonian and Gibbs measure, /3 is 
an inverse temperature, Z a partition function, D opt is the 
average ground state energy. The latter can be computed by 
the SP formalism. 

B. Computation of the Optimal Distortion 

The details of the SP equations, used to compute D ovU 
are not shown here. A pedagogical account of the formalism 
can be found in Il9ll . The numerical solution of the SP 
equations shows that the optimal distortions for the underlying 



TABLE I 

The optimal and the BPGD distortions for the LDGM(fc, 2k) 

ENSEMBLES. THE BPGD DISTORTIONS ARE EVALUATED FOR n = 200000 

NODES. The rate-distortion bound for R = 0.5 IS D SH ~ 0.1100. 



(k,2k) 


(3,6) 


(4,8) 


(5, 10) 




0.1139 
0.1357 


0.1111 
0.1590 


0.1105 
0.1811 



LDGM(/,r) and CCLDGM(7, r, L, w) are the same for each 
finite w and L (here n and m are infinite). In fact, for an 
ensemble with even / and Poisson degree for code-bit nodes, 
a rigorous proof is presented in [ 9] where the spatially coupled 
periodic XORSAT problem is discussed. Optimal distortions 
for degree distributions (/c,2/c), k = 3,4 and 5 are given in 
the first line of Table U We observe that, as k increases, the 
optimal distortion converges to the rate-distortion bound D^. 
This observation is consistent with the result in 

IV. Belief Propagation Guided Decimation 

A. Belief Propagation Approximation 

The naive way to compute the partition function involves 
the summation over 2 nLl / r terms. Unfortunately, this approach 
is too complex. The same problem occurs for computing the 
marginal distribution of a node. On a tree, however, these 
computations can be done exactly and yield the BP equations. 
We deal with graphs that are locally tree-like and this mo- 
tivates the use of the BP equations as a first approximation 
for the marginals. These involve a set of 2 \E (Y)\ real valued 
messages, each pair being associated to an edge. The messages 
on the edge (i, a) G C (Y) x G (Y) are denoted by f]i^ a and 
rj a ^i, and satisfy 

tia-+i = ~ tanrr 1 I (-1)*- tanh(|) ]J tanh (/3^ a ) J 
P \ jeda\i J 

Vi^a = ^2 Vb-+i- ( 4 ) 

b£di\a 

As a side remark, note that as (3 — > oo the BP equations 
become the so-called Min-Sum equations. From the solutions 
of equations © one can compute the BP marginal distributions 
of code-bit i and generator node a, and a Bethe approximation 
Hp P (u\x) (this is not a probability measure in general) of 
the original measure /jL/3(u\x). We refer to [19] for more 
information. 

Unfortunately, the number of solutions of the BP equations 
which lead to roughly the same distortion, grows exponentially 
large in terms of n, and one cannot find the relevant fixed point 
by a plain iterative method. To resolve this problem, the BP 
iterations are equipped with a heuristic decimation process. 
This forms the basis of the BPGD algorithm. 

B. BPGD algorithm 

In this section we consider an instance of CCLDGM 
(Z,r, L,w,n) ensemble. Equations © will be solved itera- 
tively starting from the initial conditions r)f\ a = rf^\i = ®- 



At iteration t the messages are r)f\ a and rf*\i- ^ e define the 
bias of a code-bit i at time t as 

M»)=^^, (5) 

This represents the tendency of the code-bit to be or 1. Our 
BPGD algorithm shown in Algorithm [T] uses a decimation 
condition. Let e > a small quantity, a > a large quantity 
and T an iteration time. Typical values used in the simulations 
are e = 0.01, a = 4.25 and T = 10. We say that the 
decimation condition is fulfilled if one of following occurs: 

i) After some time r(e) < T, the messages do not change 
significantly in two successive iterations, in the sense that 

E(i,a)es(r) ~ Va^\ < e for t > r(e). 

ii) For some i and t < T, \b t (i)\ > a. 

iii) None of the above conditions has taken place t <T. 



Algorithm 1 BPGD Algorithm 

1) Begin with the graph instance T^ ) G CCLDGM 
(7,r,L, w,n). 

2) t = 0, Update equations © until the first time t\ such 
that the decimation condition is fulfilled. 

3) Find B = max^ \b tl 

4) If B = 0, then randomly pick a code-bit i from range 
[^p, and fix it randomly to or 1. 

Else pick a code-bit i G {j G C (T^) | \b tl (j) \ = B) 
randomly and fix U{ = |(sign(6 tl (i)) + 1). 

5) Update xi /c+1 ^ ) = — Ui, for a G di, otherwise 
x"a +1) = x { a k) . Then, update r^ +1 ) = r( fe )\{i}. 

6) If there exists an unfixed code-bit, then go to (2), 
Else finish and return u. 



It is not difficult to see that the complexity of the algorithm 
is 0(n 2 L 2 ). Note that after each decimation, rather than 
resetting messages to zero we continue with the previous 
messages. 

An important element in the algorithm is that when there is 
no significant bias (step (4)) the random decimation occurs in 
a specific bounded interval centered around L/2 and of size w. 
We observe that this creates a seed from which the encoding 
process starts propagating through the ring. The usual hard 
decimation algorithms randomly choose (when there is no 
bias) a code-bit from the whole graph. We observed that 
when this prescription is used for CCLDGM instances, the 
distortion does not improve with respect to that of usual 
LDGM instances. 

C. Optimal f3 and Relation to BSC 

We have seen that a configuration u* maximizes the prob- 
ability up (u) for all f3 > 0. If ^ was an accurate approx- 
imation of up, we would expect the maximum of H^p{u) t0 
be independent of f3. But H^p is not an exact description of 
Hp. It is then natural to look at the performance of the BPGD 
procedure for different /J and find an optimal parameter /3 opt . 

Figure [3] illustrates the performance of BPGD versus f3 for 
CCLDGM(3, 6, 64, 3, 2000) and LDGM(3, 6, 128000) ensem- 
bles. For this example, the first observation is that, for all 




D opt = 0.1139 



5 10 15 20 

Fig. 3. BPGD distortion versus (3 for CCLDGM(3, 6, 64, 3, 2000) (bottom) 
and LDGM(3, 6, 128000) (top) ensembles. 



TABLE II 

Saturated (bold) and average BPGD distortion (parentheses) 
FOR CCLDGM(3, 6, L, w, n). 



L w 




n 






2000 


4000 


8000 


32 2 
3 


0.1182 (0.1204) 
0.1171 (0.1205) 


0.1170 (0.1191) 
0.1161 (0.1195) 


0.1162 (0.1183) 
0.1157 (0.1190) 


64 2 

3 


0.1185 (0.1195) 
0.1175 (0.1191) 


0.1172 (0.1183) 
0.1166 (0.1182) 


0.1164(0.1174) 
0.1159 (0.1175) 



2 0.1186 (0.1191) 0.1173 (0.1179) 

3 0.1176 (0.1184) 0.1167 (0.1175) 



f3 > 0, the coupled ensemble has smaller D>bpgd than the 
uncoupled ensemble. We also observe that each ensemble has 
an optimal parameter /3 opt which lies between | and ~. In 
order to make comparison between coupled and uncoupled 
ensembles and as the distortion does not vary much when 
P e [§,§], we fix /3 = 2 in the rest of the paper. Table U 
shows that the BPGD distortion - for the value f3 = 2 - is still 
consistently higher than D opt . 

The measure /i# also arises in the context of channel coding 
over a memory less binary symmetric channel (BSC). Suppose 
that we use the LDGM code with factor graph T over a BSC 
with flipping probability p. Then the posterior probability that 
the un-coded message u G F^^ r is sent given the received 
codeword x G FJ can be formally expressed just as in © 
with j3 —> In The difference between channel coding 

and lossy source coding lies in the distribution of X. In lossy 
source coding, {X a } are n i.i.d. Ber(l/2) variables; whereas 
in channel coding over a BSC, they are not in general i.i.d. 
and depend on the flipping probability p (or f3). 

V. Distortion Saturation in a CCLDGM Ensemble 

We provide simulation results which convincingly show that 
the BP distortion of a CCLDGM ensemble closely approaches 
the optimal distortion of the underlying LDGM ensemble as 
n, L and w grow large. To emphasize the effect of coupling, 
we consider the family of (fc, 2 k) -regular LDGM ensembles. 
This family is known to yield a weak performance under the 
BPGD algorithm Hi. Indeed, although the optimal distortion 
is rather close to the rate-distortion bound and improves as k 
grows, the BPGD distortion becomes larger as k grows (see 
Table U). Analogous behaviors are well known to occur in 
channel coding 11811 . 

Now consider a CCLDGM (/c, 2k,L,w) ensemble. Let D z 
denote the average distortion of the reconstruction nodes lying 
at position z. Call D z the local distortion at position z. Thus, 
the total average distortion is D = ^2 z Zo D z . Call the 
vector (.Do, Di, • • • , .Dl-i) the distortion profile. 

When we apply the BPGD algorithm, we observe a generic 
behaviour in the distortion profile. Figure 01 illustrates the 
distortion profile of CCLDGM (5, 10, L, w, n) ensembles for 
different pairs of (L,w) and n = 2000 . The profile is 
symmetric around — (this expected in view of the algorithm). 
Consider the first ^ components. In the range [0,w — 1], the 



TABLE III 

Saturated (bold) and average BPGD distortion (parentheses) 
FOR CCLDGM(4, 8, L, w, n). 



L 


w 




n 








2000 


4000 


8000 


32 


2 
3 
4 


0.1165 (0.1205) 
0.1146 (0.1212) 
0.1135 (0.1226) 


0.1148 (0.1188) 
0.1133 (0.1199) 
0.1125 (0.1215) 


0.1134 (0.1175) 
0.1124 (0.1189) 
0.1118 (0.1208) 


64 


2 
3 
4 


0.1170 (0.1190) 
0.1153 (0.1186) 
0.1144 (0.1189) 


0.1151 (0.1171) 
0.1139 (0.1172) 
0.1133 (0.1177) 


0.1138 (0.1157) 
0.1130 (0.1161) 
0.1125 (0.1169) 


128 


2 
3 
4 


0.1172 (0.1182) 
0.1156 (0.1172) 
0.1148 (0.1170) 


0.1153 (0.1162) 
0.1141 (0.1157) 
0.1136 (0.1158) 





local distortion is strictly decreasing. It starts at a value roughly 
equal to D>bpgd of the underlying ensemble and decreases to 
a value that we call the saturation value. Then, in the range 
[w, |r], the local distortions nearly remain constant and equal 
to the saturation value. We refer to the first interval and the 
second interval respectively as the unsaturated part and the 
saturated part. Therefore, the average distortion considerably 
decreases in the coupled ensembles. 

Tables HH (TTTJ and [IVl show the saturation value (in bold) 
and the average distortion (in parentheses) for k = 3, 4, 5 and 
different values of n, L and w. Each value is averaged over 100 
random code instances and random source words. Inspection 
of the tables suggests that it approaches D ovt in the regime 
n ^> L ^> w ^> 1. The average distortion D>bpgd converges 
to the saturation value by increasing L. The reason is the 
unsaturated part essentially does not change for fixed fc, w and 
n (see Figure 0]), thus its contribution in the average distortion 
vanishes in the large L limit. As a result, we expect that D> B pgd 
gets very close to for a large enough n ^> L ^> w ^> 1 
and the optimal f3. Moreover, if k grows, D opt converges to the 
rate-distortion bound. Thus, the D>bpgd of a regular CCLDGM 
ensemble can get close to the rate-distortion bound. 

As mentioned before, the coupled chains used in channel 
coding are terminated by suitable boundary conditions and this 
incurs a rate loss of order O(^). In CCLDGM ensembles, we 
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TABLE IV 

Saturated (bold) and average BPGD distortion (parentheses) 
FOR CCLDGM(5, 10, L, w, n). 



the complexity to 0(n 2 L). Finally the use of soft decimation 
techniques might also help improve the overall performance. 



L w 







2000 


4000 


8000 


32 


2 
3 
4 
5 


0.1175 (0.1229) 
0.1147 (0.1236) 
0.1134 (0.1256) 
0.1124 (0.1280) 


0.1153 (0.1207) 
0.1130 (0.1219) 
0.1120 (0.1242) 
0.1112 (0.1268) 


0.1138 (0.1192) 
0.1120 (0.1208) 
0.1111 (0.1234) 
0.1107 (0.1262) 


64 


2 
3 
4 
5 


0.1181 (0.1208) 
0.1155 (0.1199) 
0.1144 (0.1205) 
0.1137 (0.1213) 


0.1158 (0.1185) 
0.1138 (0.1182) 
0.1130 (0.1190) 
0.1124 (0.1200) 


0.1140 (0.1167) 
0.1125 (0.1169) 
0.1119 (0.1179) 
0.1116 (0.1192) 


128 


2 
3 
4 
5 


0.1182 (0.1196) 
0.1158 (0.1180) 
0.1148 (0.1178) 
0.1141 (0.1179) 


0.1158 (0.1172) 
0.1140 (0.1161) 
0.1133 (0.1162) 
0.1128 (0.1166) 





do not pay this cost in the graph structure, but it manifests 
itself in the large local distortion in the 2w positions of the 
unsaturated part. 

VI. Conclusion 

We have observed that CCLDGM (/c, 2k 1 L 1 w) ensembles 
can asymptotically saturate the rate-distortion bound under a 
BPGD algorithm. Degrees (k,2k) have been chosen because 
it is known they lead to weaker results in the uncoupled 
case, but the saturation presumably occurs for a large class of 
irregular LDGM ensembles. As a result, using a right-regular 
CCLDGM ensemble with Poisson distribution on the code-bit 
nodes, would allow to achieve the rate-distortion bound for all 
rates R e (0, 1). 

There are clearly a number of features of the algorithm that 
can be improved. Two important issues are its convergence 
rate, and its complexity. To address the first one, we can fix to 
zero the code-bits lying in range [ ( L ~ w ) ? and then, we 

remove the reconstruction nodes (and source nodes) lying in 
ranges [0, tt] and [L — 1 — ^,L — 1]. The total rate of the code 
slightly increases but the convergence rate is improved. The 
complexity of the BPGD algorithm used here is 0(n 2 L 2 ). To 
reduce it one could use a "sliding window encoding" similar to 
the one used for coupled LDPC codes [3] . This would reduce 
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